High-Performance Floating Point Divide
نویسندگان
چکیده
In modern processors floating point divide operations often take 20 to 25 clock cycles, five times that of multiplication. Typically multiplicative algorithms with quadratic convergence are used for high-performance divide. A divide unit based on the multiplicative Newton-Raphson iteration is proposed. This divide unit utilizes the higher-order Newton-Raphson reciprocal approximation to compute the quotient fast, efficiently and with high throughput. The divide unit achieves fast execution by computing the square, cube and higher powers of the approximation directly and much faster than the traditional approach with serial multiplications. Additionally, the second, third, and higher-order terms are computed simultaneously further reducing the divide latency. Significant hardware reductions have been identified that reduce the overall computation significantly and therefore, reduce the area required for implementation and the power consumed by the computation. The proposed hardware unit is designed to achieve the desired quotient precision in a single iteration allowing the unit to be fully pipelined for maximum throughput.
منابع مشابه
Variable Precision Floating-Point Divide and Square Root for Efficient FPGA Implementation of Image and Signal Processing Algorithms
Field Programmable Gate Arrays (FPGAs) are frequently used to accelerate signal and image processing algorithms due to their flexibility, relatively low cost, high performance and fast time to market. For those applications where the data has large dynamic range, floating-point arithmetic is desirable due to the inherent limitations of fixed-point arithmetic. Moreover, optimal reconfigurable ha...
متن کاملProving the IEEE Correctness of Iterative Floating-Point Square Root, Divide, and Remainder Algorithms
The work presented in this paper was initiated as part of a study on software alternatives to the hardware implementations of floating-point operations such as divide and square root. The results of the study proved the viability of software implementations, and showed that certain proposed algorithms are comparable in performance to current hardware implementations. This paper discusses two co...
متن کاملThe IBM eServer z990 floating-point unit
z990 floatingpoint unit G. Gerwig H. Wetter E. M. Schwarz J. Haess C. A. Krygowski B. M. Fleischer M. Kroener The floating-point unit (FPU) of the IBM z990 eServer is the first one in an IBM mainframe with a fused multiply-add dataflow. It also represents the first time that an SRT divide algorithm (named after Sweeney, Robertson, and Tocher, who independently proposed the algorithm) was used i...
متن کاملAn Overview of Floating-Point Support and Math Library on the Intel XScaleTM Architecture
New microprocessor architectures often require software support for basic arithmetic operations such as divide, or square root. The Intel R XScale processor, designed for low power mobile devices, provides no hardware support for floating-point. We show that an efficient software implementation of the basic operations and math library routines can achieve competitive performance, and effectivel...
متن کاملIA-64 Floating-Point Operations and the IEEE Standard for Binary Floating-Point Arithmetic
This paper examines the implementation of floating-point operations in the IA-64 architecture from the perspective of the IEEE Standard for Binary Floating-Point Arithmetic [1]. The floating-point data formats, operations, and special values are compared with the mandatory or recommended ones from the IEEE Standard, showing the potential gains in performance that result from specific choices. T...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001